Summarization and Matching of Density-Based Clusters in Streaming Environments

Authors

  • Di Yang
  • Elke A. Rundensteiner
  • Matthew O. Ward
Abstract

Density-based cluster mining serves a broad range of applications, from stock trade analysis to moving object monitoring. Although methods for efficiently extracting density-based clusters have been studied in the literature, the problem of summarizing and matching such clusters, which can have arbitrary shapes and complex internal structures, remains unsolved. The goal of our work is therefore to extend the state of the art in density-based cluster mining over streams from cluster extraction alone to also supporting analysis and management of the extracted clusters. Our work addresses three major technical challenges. First, we propose a novel multi-resolution cluster summarization method, called Skeletal Grid Summarization (SGS), which captures the key features of density-based clusters, covering both their external shape and their internal structure. Second, to summarize the extracted clusters in real time, we present an integrated computation strategy, C-SGS, which piggybacks the generation of cluster summarizations on the online clustering process. Lastly, we design a mechanism to efficiently execute cluster matching queries, which identify clusters similar to a given cluster of interest to the analyst among the clusters extracted earlier in the stream history. Our experimental study using real streaming data shows that the proposed methods clearly outperform potential alternatives in both efficiency and effectiveness for cluster summarization and cluster matching queries.

Similar resources

Graph Hybrid Summarization

One solution for processing and analyzing massive graphs is summarization. Generating a high-quality summary is the main challenge of graph summarization. To generate a better-quality summary for a given attributed graph, both structural and attribute similarities must be considered. There are two measures, named density and entropy, to evaluate the quality of structural and at...

XML Dissemination Scheme for Mobile Computing Based on Lineage Encoding

In wireless environments, broadcasting is an efficient and scalable way to deliver information to a massive number of clients. We propose an energy- and latency-efficient XML dissemination scheme for wireless mobile computing environments. This paper presents a novel unit structure called G-node for streaming XML data in the wireless system. It applies the benefits of the structure inde...

Dissemination of Xml Data in Wireless Environment Supporting Twig Pattern Queries

The main aim of this paper is to improve the energy and latency efficiency of an XML dissemination scheme for mobile computing, which is based on Lineage Encoding, G-node and a scheduling algorithm for streaming XML data in the wireless environment. In this paper we propose a new broadcast scheduling algorithm, Frequently Access First (FAF), which effectively organizes XML data on wireless channels....

Site Regression Biplot Analysis for Matching New Improved Lentil Genotypes into Target Environments

The evaluation of the yield stability of genotypes and environments is of prime concern to plant breeders. Therefore, a comprehensive analysis of the structure of the GE interaction is needed. The objective of this investigation was to evaluate the use of sites regression (SREG) GGE methodology to stratify the genotype × environment (GE) interaction in lentil. Yield data of 10 genotypes of le...

Effective Browsing of Image Search Results with Visual and Diverse Summarization through Clustering

With the unprecedented growth in the production of digital images and the use of multimedia references, the need for image and subject search has increased. Systematic processing of this information is a basic prerequisite for analyzing, organizing and managing it effectively. Likewise, large collections of images have been made available on the Web and many search engines have provided the poss...

Journal:
  • PVLDB

Volume 5


Publication date: 2011